A classical result (often credited to Y. Medvedev) states that every language recognized by a finite
automaton is the homomorphic image of a local language, over a much larger so-called local alphabet, 
namely the alphabet of the edges of the transition graph. Local languages are characterized by the 
value k = 2 of the sliding window width in McNaughton and Papert's infinite hierarchy of strictly
locally testable languages (k-slt). We generalize Medvedev's result in a new direction, studying the
relationship between the width and the alphabetic ratio telling how much larger the local alphabet is. 
We prove that every regular language is the image of a k-slt language on an alphabet of doubled size, 
where the width logarithmically depends on the automaton size, and we exhibit regular languages for 
which any smaller alphabetic ratio is insufficient. More generally, we express the trade-off between 
alphabetic ratio and width as a mathematical relation derived from a careful encoding of the states. 
Finally, we mention some directions for theoretical development and application.

1 Introduction 

A classical result [13], often credited to Y. Medvedev [12], states that every regular language is the
homomorphic image of a local language over a larger alphabet called local. In a local language the 
sentences are characterized by three sets: the initial letters, the final letters and the set of factors of length 
k = 2. Parameter k is the width of the simplest sliding window device introduced by McNaughton and 
Papert [11]. The result simply derives from the fact that the set of paths in an edge-labelled graph is a
local language over the alphabet of the edges. Considering a finite automaton for the regular language, 
the local language of accepting paths can be naturally projected on the original language. 

Our work originates from two observations. First, in the classic result the alphabet of the local 
language is larger than the source alphabet, by a multiplicative factor, to be called the alphabetic ratio, in 
the order of the square of the number of states. The simplicity of sliding window machines and languages 
is very attractive, but the huge size of the local alphabet in Medvedev's theorem makes their application
impractical. 

Then a natural question concerns the local alphabet in the classical result: how small can the alphabetic
ratio be? A small alphabet may, for instance, make it possible to encode messages from a regular language
into an slt language, to be transmitted over a communication channel, so that a more economical sliding
window receiver can be used instead of a general finite state machine.

Second, the local languages form one level of McNaughton and Papert's [11] infinite hierarchy of
k-strictly locally testable (for short, k-slt) languages. Then, by considering k-slt languages instead of just 2-slt, i.e.,
local, languages, we raise a more general question: what is the minimum alphabetic ratio such that, for
some finite parameter k, every regular language is the alphabetic homomorphic image of a k-slt language?
In that case, how big does the width parameter k need to be? More precisely, our main result, which
generalizes Medvedev's theorem, expresses the trade-off between two parameters: the alphabetic ratio and
the width.

We briefly recall the early but enduring interest in subfamilies of regular languages
characterized by some form of local testability, without entering into details.

At the basis of formal language theory, the classical theorem of N. Chomsky and M.P. Schützenberger
characterizes context-free languages by a homomorphism applied to the intersection of a Dyck language
and a 2-slt one. Several similar characterizations for other language families have later been proved. In
mathematics, the slt languages have been applied in the theory of semigroups by A. De Luca and A.
Restivo [1]. In linguistics, a persistent idea is that natural languages can be modeled, at various levels, by
locally testable properties. For instance, the psychologist W. Wickelgren [14] made the observation that
the set of English words is essentially a 3-slt (finite) language, and several brain scientists (in particular
V. Braitenberg [3]) have suggested that sequences of finite length, such as the factors occurring in a
locally testable language, can be easily stored and recognized by certain neural circuits (in particular the
synfire chains of M. Abeles) that have been observed in the cortex. In computational linguistics, locally
testable definitions have proved to be useful at various levels of finite-state models. Many researchers (e.g.,
[7]) working on language learning models have been attracted by the efficiency of learning algorithms
for various types of locally testable languages. Contemporary comparative work on the aural pattern
recognition capabilities of humans and animals [10] has called attention to the subregular hierarchies
induced by local testability. In mathematical biology, in his seminal article on language theory and DNA
[9], T. Head shows that certain splicing languages are precisely the slt languages.

The paper is organized as follows. After the basic definitions in Section 2, we introduce in Section 3 a
new classification of regular languages based on their homomorphic characterization via a k-slt language
over an alphabet of size m. In Section 3 we also prove a lower bound on the alphabetic ratio. In Section 4 we
state and prove a generalization of Medvedev's theorem, including a mathematical analysis of the
relationship between language complexity, alphabetic ratio, and width. The Conclusion presents an open
problem and mentions conceivable developments and applications of the main result.

2 Preliminaries 

The empty word is denoted by $\varepsilon$. The terminal alphabet of the source language is denoted by $A$. For
simplicity we deal only with languages in $A^+$, which do not contain the empty word. The cardinality of
an alphabet will be called the arity; the arity of a language is the arity of its alphabet.

A nondeterministic finite automaton (NFA) $M$ is a quintuple $M = (Q, A, E, q_0, F)$ where $Q$ is a finite
set of states, $A$ is a finite alphabet, the transition relation (or graph) is $E \subseteq Q \times A \times Q$, $q_0 \in Q$ is the
initial state, and $F \subseteq Q$ is the set of final states, which does not contain $q_0$ (since only $\varepsilon$-free languages are
considered).

Two transitions $(p,a,q)$ and $(p',a',q')$ are consecutive if $q = p'$. A path $\eta = e_0 e_1 \ldots e_{n-1}$ is a finite
sequence of $n \ge 0$ consecutive transitions $e_0 = (p_0,a_0,p_1)$, $e_1 = (p_1,a_1,p_2)$, ..., $e_{n-1} = (p_{n-1},a_{n-1},p_n)$.
The origin of $\eta$ is $o(\eta) = p_0$, its end is $e(\eta) = p_n$, and its label is $l(\eta) = a_0 a_1 \ldots a_{n-1}$. A successful path
is a path with origin $q_0$ and end in $F$. The language recognized by $M$, denoted $L(M)$, is the set of labels
of all successful paths of $M$.

We assume, without loss of generality, that the transition relation is total, i.e., for every $q \in Q$, $a \in A$,
the set $\{p \in Q \mid (q,a,p) \in E\}$ is not empty (if $E$ is not total, just add a new sink state to $Q$).

Given another finite alphabet $B$, an (alphabetic) homomorphism is a mapping $\pi: B \to A$. For a
language $L' \subseteq B^+$, its (homomorphic) image under $\pi$ is the language $L = \{\pi(x) \mid x \in L'\}$.
For every word $w \in A^+$, for every $k \ge 2$, let $i_k(w)$ and $t_k(w)$ denote the prefix and, respectively, the
suffix of $w$ of length $k$ if $|w| \ge k$, or $w$ itself if $|w| < k$. Let $f_k(w)$ denote the set of factors of $w$ of
length $k$. Extend $i_k, t_k, f_k$ to languages as usual, i.e., $i_k(L) = \{i_k(w) \mid w \in L\}$, $t_k(L) = \{t_k(w) \mid w \in L\}$,
and $f_k(L) = \bigcup_{w \in L} f_k(w)$. A factor of a word $w$ starting at position $k$ and ending at position $h$, with
$1 \le h, k \le |w|$, is defined as follows:

$$s_{k,h}(w) = w_k w_{k+1} \cdots w_h \quad \text{for } k \le h, \text{ where } w_i \text{ denotes the } i\text{-th letter of } w.$$

Hence, for $h \ge k$, $|s_{k,h}(w)| = h - k + 1$.

Definition 1. A language $L$ is k-strictly locally testable¹, shortly k-slt, if there exist finite sets $I_{k-1}, T_{k-1} \subseteq A^{k-1}$
and $F_k \subseteq A^k$ such that, for every $x \in A^k A^*$, the following condition holds:

$$x \in L \iff i_{k-1}(x) \in I_{k-1} \,\land\, t_{k-1}(x) \in T_{k-1} \,\land\, f_k(x) \subseteq F_k.$$

A language is strictly locally testable (slt) if it is k-slt for some k, called the width.

This definition ignores words shorter than $k$, which however can be checked directly against a
finite set, if needed. The case $k = 2$ corresponds to the very well known family of local languages (see
for instance [13] or [2]). The following example will be referred to later.

Example 1. The language $L' = (a'a)^+ \cup (b'b)^+$ is 2-slt, i.e., local, since it can be defined by the sets
$I_1 = \{a', b'\}$, $T_1 = \{a, b\}$, $F_2 = \{a'a, b'b, aa', bb'\}$.
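
Definition 1 and Example 1 can be made concrete with a small membership test. The following Python sketch is only illustrative: the function names (`prefix`, `suffix`, `factors`, `is_in_kslt`) and the representation of words as tuples of symbol strings are assumptions of this sketch, not notation from the paper.

```python
def prefix(w, k):
    """i_k(w): prefix of length k, or w itself if |w| < k."""
    return tuple(w[:k])

def suffix(w, k):
    """t_k(w): suffix of length k, or w itself if |w| < k."""
    return tuple(w[-k:]) if k > 0 else ()

def factors(w, k):
    """f_k(w): set of factors of w of length k."""
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}

def is_in_kslt(x, k, I, T, F):
    """Membership test of Definition 1, for words x with |x| >= k."""
    if len(x) < k:
        raise ValueError("Definition 1 only covers words of length >= k")
    return (prefix(x, k - 1) in I
            and suffix(x, k - 1) in T
            and factors(x, k) <= F)

# Example 1: L' = (a'a)^+ U (b'b)^+ with width k = 2.
I1 = {("a'",), ("b'",)}
T1 = {("a",), ("b",)}
F2 = {("a'", "a"), ("b'", "b"), ("a", "a'"), ("b", "b'")}

print(is_in_kslt(("a'", "a", "a'", "a"), 2, I1, T1, F2))  # True
print(is_in_kslt(("a'", "a", "b'", "b"), 2, I1, T1, F2))  # False: factor a b' is not in F2
```

Words shorter than the width are rejected with an exception here, mirroring the fact that the definition leaves them to a separate finite check.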

It is known and straightforward to prove that the family of slt languages is strictly included in the
family of regular languages, and that it forms an infinite strict hierarchy ordered by the width value. For
instance, the language $L_h = (ab^h)^+$ on $A = \{a,b\}$, with $h > 1$ a constant, is $(h+1)$-slt, but it is not
$h$-slt. In fact, $L_h$ is defined by the sets $I_h = \{ab^{h-1}\}$, $T_h = \{b^h\}$, $F_{h+1} = \{b^i a b^{h-i} \mid 0 \le i \le h\}$.
However, $L_h$ is not $h$-slt: consider the words $ab^h \in L_h$ and $ab^{h+1} \notin L_h$: $i_{h-1}(ab^h) = i_{h-1}(ab^{h+1}) = ab^{h-2}$,
$t_{h-1}(ab^h) = t_{h-1}(ab^{h+1}) = b^{h-1}$, $f_h(ab^h) = \{ab^{h-1}, b^h\} = f_h(ab^{h+1})$. Hence, the two words above
cannot be distinguished by using width $h$.
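
The indistinguishability argument above can be checked mechanically. The sketch below is an illustration under stated assumptions: `window_profile` is a hypothetical helper name, words are plain Python strings, and the choice h = 3 is arbitrary.

```python
def window_profile(w, k):
    """Return (i_{k-1}(w), t_{k-1}(w), f_k(w)) for a word w given as a string."""
    facs = {w[i:i + k] for i in range(len(w) - k + 1)}
    return w[:k - 1], w[len(w) - (k - 1):], facs

h = 3
u = "a" + "b" * h          # ab^h, a word of L_h
v = "a" + "b" * (h + 1)    # ab^(h+1), not a word of L_h

# With width h the two words have the same prefix, suffix and factor set.
print(window_profile(u, h) == window_profile(v, h))          # True
# With width h + 1 the factor b^(h+1) separates them.
print(window_profile(u, h + 1) == window_profile(v, h + 1))  # False
```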

3 Lower Bounds 

As said, every regular language, to be referred to as source, is the image of a 2-slt language whose arity
may be much larger than the arity of the source. To talk precisely about the width of the slt language and
about the ratio of the arities of the slt and source languages, we introduce a definition.
Definition 2. For $k \ge 2$, $m \ge 1$, a language $L \subseteq A^+$ is $(m,k)$-homomorphic if there exist an alphabet $B$
(called local) of arity $m$, a $k$-slt language $L' \subseteq B^+$, and a homomorphism $\pi: B \to A$ such that $L = \pi(L')$.

Clearly, if $L \subseteq A^+$ is $k$-slt then $L$ is trivially $(|A|, k)$-homomorphic. Otherwise, a local alphabet larger
than $A$ is needed. For instance, the language $L = (aa)^+ \cup (bb)^+$ is not slt but the language $L' = (a'a)^+ \cup
(b'b)^+$ of Ex. 1 is 2-slt. By defining $\pi: \{a, a', b, b'\} \to \{a, b\}$ as $\pi(a) = \pi(a') = a$, $\pi(b) = \pi(b') = b$,
we have $L = \pi(L')$ and hence $L$ is $(4,2)$-homomorphic. The alphabetic ratio of $L'$ and $L$ is $4/2 = 2$.
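
As a sanity check of this example, the image of the local language can be enumerated by brute force up to a bounded length. This is a minimal sketch; the names `in_L_prime` and `image_up_to`, and the length bound 6, are assumptions of the sketch.

```python
from itertools import product

B = ["a", "a'", "b", "b'"]                        # local alphabet, arity 4
pi = {"a": "a", "a'": "a", "b": "b", "b'": "b"}   # homomorphism from B to A
I1 = {"a'", "b'"}
T1 = {"a", "b"}
F2 = {("a'", "a"), ("b'", "b"), ("a", "a'"), ("b", "b'")}

def in_L_prime(x):
    """2-slt test of Definition 1 for a tuple x of symbols of B, |x| >= 2."""
    return (x[0] in I1 and x[-1] in T1
            and all((x[i], x[i + 1]) in F2 for i in range(len(x) - 1)))

def image_up_to(max_len):
    """Images pi(x) of all words x of L' with 2 <= |x| <= max_len."""
    return {"".join(pi[s] for s in x)
            for n in range(2, max_len + 1)
            for x in product(B, repeat=n) if in_L_prime(x)}

print(sorted(image_up_to(6)))
# ['aa', 'aaaa', 'aaaaaa', 'bb', 'bbbb', 'bbbbbb']
```

Up to length 6 the image coincides with the words of $(aa)^+ \cup (bb)^+$, as expected from $L = \pi(L')$.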

The traditional construction (e.g., in [13]) of a 2-slt language $L'$ considers an NFA $(Q, A, E, q_0, F)$ of
size $n = |Q|$ for $L$, and uses the set $E$ as local alphabet, i.e., up to $n^2 \cdot |A|$ elements. Hence we can restate
Medvedev's property by saying that every regular language on $A$ is $(n^2 \cdot |A|, 2)$-homomorphic (the alphabetic
ratio is $n^2$). However, it is straightforward to show that the arity of the local alphabet can be reduced to
$n \cdot |A|$.

¹ The original name in [11] is "k-testable in the strict sense". This concept should not be confused with other language
families based on local tests; see [4] for a recent account.

Proposition 1. Every regular language, accepted by an NFA with $n$ states, is $(n \cdot |A|, 2)$-homomorphic.

Proof. Let $M = (Q, A, E, q_0, F)$ be an NFA. Define two mappings $\pi: Q \times A \to A$ and $\rho: Q \times A \times Q \to
Q \times A$ such that $\pi((q,a)) = a$ for every $a \in A$, $q \in Q$, and $\rho(p,a,q) = (p,a)$ for every $p, q \in Q$, $a \in A$;
both mappings are extended to words letter by letter. The following sets define a 2-slt language $L' \subseteq (Q \times A)^+$:

$I_1 = \{(q_0,a) \mid a \in A\}$;

$F_2 = \{(q,a)(q',b) \mid a,b \in A,\ q,q' \in Q,\ (q,a,q') \in E\}$;
$T_1 = \{(q,a) \mid a \in A,\ \exists q' \in F : (q,a,q') \in E\}$.

We show first that $\pi(L') \subseteq L$. Let $w \in \pi(L')$. Hence, there exists $x \in L'$ such that $\pi(x) = w$. We claim
that there exists a successful path $\eta$ of $M$ such that $x = \rho(\eta)$. Let $n = |w|$. Since $x \in L'$, there exist
$q_1, q_2, \ldots, q_{n-1} \in Q$, $a_0, a_1, \ldots, a_{n-1} \in A$ such that $x = (q_0,a_0)(q_1,a_1) \ldots (q_{n-1},a_{n-1})$, and $w = a_0 a_1 \ldots a_{n-1}$.
Since $(q_{n-1},a_{n-1}) \in T_1$, there exists $q \in F$ such that $(q_{n-1},a_{n-1},q) \in E$. Let $\eta$ be $(q_0,a_0,q_1)(q_1,a_1,q_2)
\ldots (q_{n-1},a_{n-1},q)$: $\eta$ has label $w$, origin in $q_0$ and end in a final state; moreover, $\rho(\eta) = x$. By definition
of $F_2$, every factor $(q_{i-1},a_{i-1})(q_i,a_i)$ of $x$, for $1 \le i < n$, must be such that $(q_{i-1},a_{i-1},q_i) \in E$, hence all
the triples of $\eta$ are transitions of $M$, i.e., $\eta$ is a successful path of label $w$.
We show that $L \subseteq \pi(L')$. Let $w \in L$ be accepted by a successful path $\eta$ of $M$ of the form

$(q_0,a_0,q_1)(q_1,a_1,q_2) \ldots (q_{n-1},a_{n-1},q_n)$,

with $q_n \in F$ and $a_0 \ldots a_{n-1} = w$. We claim that $\rho(\eta) \in L'$. In fact, $i_1(\rho(\eta)) = (q_0,a_0) \in I_1$, $t_1(\rho(\eta)) =
(q_{n-1},a_{n-1}) \in T_1$ and $f_2(\rho(\eta)) = \{(q_{i-1},a_{i-1})(q_i,a_i) \mid 1 \le i < n\}$. Since each $(q_{i-1},a_{i-1},q_i) \in E$ (being a
transition of $\eta$), $f_2(\rho(\eta)) \subseteq F_2$. Since $\pi(\rho(\eta)) = w$, it follows that $w \in \pi(L')$. □
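
The construction in the proof of Proposition 1 can be tried out on a small automaton. The following sketch is illustrative only: the NFA for $(aa)^+ \cup (bb)^+$, the helper names (`in_local`, `accepts`, `compare`) and the length bound are assumptions of this sketch. The example automaton is not total, which the bounded check below does not require.

```python
from itertools import product

# Hypothetical NFA for (aa)^+ U (bb)^+ (an assumption of this sketch).
A = "ab"
Q = [0, 1, 2, 3, 4]
q0, FINAL = 0, {2, 4}
E = {(0, "a", 1), (1, "a", 2), (2, "a", 1),
     (0, "b", 3), (3, "b", 4), (4, "b", 3)}

# Sets of Proposition 1 over the local alphabet Q x A.
I1 = {(q0, a) for a in A}
F2 = {((p, a), (q, b)) for (p, a, q) in E for b in A}
T1 = {(p, a) for (p, a, q) in E if q in FINAL}

def in_local(x):
    """2-slt membership test (Definition 1 with k = 2) for a tuple x, |x| >= 2."""
    return (x[0] in I1 and x[-1] in T1
            and all((x[i], x[i + 1]) in F2 for i in range(len(x) - 1)))

def accepts(w):
    """Subset simulation of the NFA on the word w."""
    current = {q0}
    for c in w:
        current = {q for (p, a, q) in E if p in current and a == c}
    return bool(current & FINAL)

def compare(max_len):
    """Return (pi(L'), L(M)), both restricted to lengths 2..max_len."""
    local_alphabet = [(q, a) for q in Q for a in A]
    image = {"".join(a for (_, a) in x)
             for n in range(2, max_len + 1)
             for x in product(local_alphabet, repeat=n) if in_local(x)}
    language = {"".join(w)
                for n in range(2, max_len + 1)
                for w in product(A, repeat=n) if accepts(w)}
    return image, language

image, language = compare(4)
print(image == language, sorted(language))  # True ['aa', 'aaaa', 'bb', 'bbbb']
```

The local alphabet here has $n \cdot |A| = 10$ symbols, against up to $n^2 \cdot |A| = 50$ for the edge alphabet of the traditional construction.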

A natural question, to be addressed later, is whether, by allowing the width $k$ to be larger than 2, it
is possible to reduce the arity of the local alphabet to less than $n \cdot |A|$. Next we prove the simple, but
perhaps unexpected, result that the local alphabet cannot be smaller than twice the size of the source one.

Theorem 1. For every alphabet $A$, there exists a regular language $L \subseteq A^+$ that is not $(2 \cdot |A| - 1, k)$-homomorphic,
for every $k \ge 2$.

Proof. Let $L$ be defined by the regular expression $\bigcup_{a \in A} (aa)^+$. By contradiction, assume that there exist
$k \ge 2$ and a local alphabet $B$ of arity $2|A| - 1$, a mapping $\pi: B \to A$ and a $k$-slt language $L' \subseteq B^+$ such
that $\pi(L') = L$. Since $|B| = 2 \cdot |A| - 1$, there exists at least one symbol of $A$, say $a$, such that there
is only one symbol $b \in B$ such that $\pi(b) = a$. Since the word $a^{2k} \in L$, there exists $x \in L'$ such that
$\pi(x) = a^{2k}$. By definition of $\pi$ and of $B$, $x = b^{2k}$. Consider the word $xb = b^{2k+1}$. Clearly, $\pi(xb) = a^{2k+1}$,
which is not in $L$, since all words in $L$ have even length. Hence, $xb \notin L'$. But $i_{k-1}(x) = i_{k-1}(xb) = b^{k-1}$,
$t_{k-1}(x) = t_{k-1}(xb) = b^{k-1}$, $f_k(x) = f_k(xb) = \{b^k\}$ and, by Definition 1, $xb$ is in $L'$, a contradiction. □
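
The key step of the proof, namely that $b^{2k}$ and $b^{2k+1}$ are indistinguishable by any window of width $k$, can be checked directly for small widths. This is a minimal sketch; `profile` and `indistinguishable` are hypothetical helper names and the tested range of $k$ is arbitrary.

```python
def profile(w, k):
    """Return (i_{k-1}(w), t_{k-1}(w), f_k(w)) for a word w given as a string."""
    facs = {w[i:i + k] for i in range(len(w) - k + 1)}
    return w[:k - 1], w[len(w) - (k - 1):], facs

def indistinguishable(k):
    """True if b^(2k) and b^(2k+1) have the same width-k profile."""
    x = "b" * (2 * k)
    return profile(x, k) == profile(x + "b", k)

print(all(indistinguishable(k) for k in range(2, 10)))  # True
```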

The same result holds (with a very similar proof) if in the statement the class of strictly locally
testable languages is replaced by the class of locally testable languages². The question whether an
alphabetic ratio of two is sufficient is addressed in the next section.



² They are the boolean closure of slt languages, see [11].

4 Main Result

The intuitive idea that by increasing the width one can use a smaller alphabet for the slt language is
studied in detail. Our approach consists of defining an slt language using a larger alphabet that encodes
the states traversed by the original automaton into words of fixed length. Our main theorem states the
relationship between the language complexity in terms of number of states, the alphabetic ratio, and the
width of the slt language.

Theorem 2. If a language $L \subseteq A^+$ is accepted by an NFA with $n > 1$ states, then for every $h \ge 2$, $L$ is
$\left(h|A|, O\!\left(\frac{\lg n}{\lg h}\right)\right)$-homomorphic.

The rest of the section is devoted to the proof. Special care is taken to find a very succinct encoding
of the original states into strings of the local alphabet, in order to reach the minimal alphabetic ratio.
Since it may be important for applications, our encoding also produces a small, although not optimal,
width of the slt language. The proofs are organized so that the main lemmas hold independently of the
chosen encoding, which only affects the numerical results.

The next definitions set the basis for stating the properties a good encoding should have. Only fixed-length
encodings are considered. Let $D$ be a finite alphabet. Let $M = (Q, A, E, q_0, F)$ be an NFA, where $E$
is total, and let $n = |Q| > 1$.

Given an integer $m \ge \lceil \lg_{|D|}(|Q|) \rceil$, a code of $Q$ into $D$ of length $m$ is a mapping $[\ ]: Q \to D^m$ such that
for every $p, q \in Q$, if $p \ne q$ then $[p] \ne [q]$. Consider a word $x$ that is a factor of $[Q^+]$. We want to decode
$x$ to one state. This will be useful when defining an slt language whose homomorphic image is $L(M)$. If
$|x| \ge 2m$, since $x$ may include the concatenation of $[q]$ and $[p]$, $q, p \in Q$, it is not decodable to just one
state symbol; moreover, if $|x| < 2m - 1$ then $x$ may not contain any factor of the form $[q]$. However, if $|x|$
is exactly $2m - 1$, then the word is bound to include at least one factor of the form $[q]$, for some $q \in Q$,
which can be decoded to $q$. In addition, we want this decoding to be unique.

The traditional notion of decodability (for every $x, y \in Q^+$, if $[x] = [y]$ then $x = y$) is not adequate,
since it assumes that the word to be decoded is a string in $[Q^+]$, while we need to consider a factor of
$[Q^+]$. A word $x \in D^{2m-1}$ is said to be factor-decodable if there exists one, and only one, position $j$,
$1 \le j \le m - 1$, such that there exists $q \in Q$ with $s_{j,j+m}(x) = [q]$. A code $[\ ]: Q \to D^m$ is factor-decodable if
every word in $f_{2m-1}([Q^+])$ is factor-decodable.
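
Factor-decodability of a concrete code can be tested by brute force. The sketch below makes several assumptions that are not in the paper: the helper name `is_factor_decodable`, the two sample codes, and an operational reading of the definition that uses 0-based start offsets and counts, among the $m$ windows of length $m$ inside a word of length $2m - 1$, those that equal a codeword.

```python
from itertools import product

def is_factor_decodable(code):
    """code: dict mapping each state to a string of fixed length m over D."""
    words = set(code.values())
    m = len(next(iter(words)))
    assert all(len(c) == m for c in words) and len(words) == len(code)
    # Every factor of length 2m-1 of a word of [Q^+] lies inside
    # three consecutive codewords, so triples are enough.
    for triple in product(sorted(words), repeat=3):
        text = "".join(triple)
        for start in range(len(text) - (2 * m - 1) + 1):
            x = text[start:start + 2 * m - 1]
            hits = sum(x[j:j + m] in words for j in range(m))
            if hits != 1:
                return False
    return True

print(is_factor_decodable({"p": "0100", "q": "1100"}))  # True
print(is_factor_decodable({"p": "01", "q": "10"}))      # False: 010 contains both 01 and 10
```

The first code ends every codeword with 00 and has no other occurrence of 00 inside a codeword, which is the idea used in the construction that follows.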

Lemma 1. For all finite alphabets $Q$, $D$ of cardinalities $n = |Q|$ and $h = |D|$, with $n \ge 2$, $2 \le h \le n$, there
exists a factor-decodable code of $Q$ into $D$ of length $m = \lceil g(h) + f(h) \lg_2 n \rceil \ge 3$, with:

$$f(h) = \left(\lg_2\!\left(h - 1 + \sqrt{(h-1)(h+3)}\right) - 1\right)^{-1}$$
$$g(h) = 1 + \frac{f(h)}{2}\left(2 + \lg_2(h-1) + \lg_2(h+3)\right).$$

Sketch of the proof. Let $0 \in D$ be a symbol. The idea is to let code $[\ ]$ be such that for every $q \in Q$,
$[q]$ ends with the word 00, i.e., $s_{m-1,m}([q]) = 00$, and there is no other occurrence of 00 in $[q]$. Formally,
for every $i$, $1 \le i \le m - 1$, if $s_{i,i+1}([q]) = 00$ then $i = m - 1$. This is enough for factor-decodability. To
find how large $m$ must be as a function of $h$ and $n$, first consider, for every $m \ge 2$, the set $S(m)$ of words
in $D^m$ such that $x \in S(m)$ if $x$ has suffix 00 and in $x$ there is no other occurrence of 00. If $|S(m)| \ge n$,
then it is possible to assign a distinct word in $S(m)$ to every state of $Q$. The definition of $S(m)$ is by
induction on $m \ge 2$. $S(2) = \{00\}$, i.e., the only word in $S(2)$ is 00. $S(3) = \{d00 \mid d \in D - \{0\}\}$. Given
sets $S(m-1)$, $S(m-2)$, let $S(m)$ be:

$$\{dy \mid d \in D - \{0\},\ y \in S(m-1)\} \cup \{0dy \mid d \in D - \{0\},\ y \in S(m-2)\}.$$

Hence, $|S(2)| = 1$, $|S(3)| = h - 1$ and

$$|S(m)| = (h-1)\,|S(m-1)| + (h-1)\,|S(m-2)|.$$

This recurrence relation is closely connected to the so-called Lucas sequence $U_m(P,Q)$, where $P$, $Q$ are
integers³ (see, e.g., p. 395 of [6]): $U_1(P,Q) = 1$, $U_2(P,Q) = P$, and for $m \ge 3$, $U_m(P,Q) = P\,U_{m-1}(P,Q) -
Q\,U_{m-2}(P,Q)$. For $P = 1$, $Q = -1$ this is just the Fibonacci sequence. If $P^2 - 4Q > 0$, a closed-form
solution for every $m > 0$ is $U_m(P,Q) = \frac{a^m - b^m}{a - b}$, where $a = \frac{P + \sqrt{P^2 - 4Q}}{2}$, $b = \frac{P - \sqrt{P^2 - 4Q}}{2}$.
In our case $|S(m)| = U_{m-1}(h-1, -(h-1))$, so that $P^2 - 4Q = (h-1)(h+3)$. With standard
algebraic manipulations and by defining $f(h)$, $g(h)$ as in the statement of the Lemma, one can derive that
$|S(m)| \ge n$ is satisfied if $m = \lceil g(h) + f(h) \lg_2 n \rceil$.
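
The recurrence for $|S(m)|$ can be cross-checked against a brute-force enumeration of $S(m)$. The following sketch is illustrative: the names `S`, `count_S` and `minimal_length`, the sample alphabet `"012"` (so that h = 3) and the sample value n = 10 are assumptions of the sketch.

```python
from itertools import product

def S(m, D):
    """Words of D^m that end with 00 and contain no other occurrence of 00."""
    result = set()
    for letters in product(D, repeat=m):
        w = "".join(letters)
        if w.endswith("00") and "00" not in w[:-1]:
            result.add(w)
    return result

def count_S(m, h):
    """|S(m)| from the recurrence |S(m)| = (h-1)(|S(m-1)| + |S(m-2)|), m >= 2."""
    prev, cur = 1, h - 1              # |S(2)|, |S(3)|
    if m == 2:
        return prev
    for _ in range(m - 3):
        prev, cur = cur, (h - 1) * (cur + prev)
    return cur

def minimal_length(n, h):
    """Smallest m >= 2 with |S(m)| >= n (assumes h >= 2)."""
    m = 2
    while count_S(m, h) < n:
        m += 1
    return m

D = "012"                                      # h = 3, with distinguished symbol 0
print([len(S(m, D)) for m in range(2, 8)])     # [1, 2, 6, 16, 44, 120]
print([count_S(m, 3) for m in range(2, 8)])    # [1, 2, 6, 16, 44, 120]
print(minimal_length(10, 3))                   # 5
```

For h = 2 the same recurrence yields the Fibonacci numbers, in agreement with the remark on $P = 1$, $Q = -1$.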

Remark 1. Both $f(h)$ and $g(h)$ are monotonically decreasing with $h$, although very slowly for large $h$,
with $\lim_{h \to \infty} f(h) \lg_2 h = 1$, $\lim_{h \to \infty} g(h) = 2$, with, moreover, $0 < f(h) \lesssim 1.44$, $2 < g(h) \lesssim 4.11$. The
expression for $m$ is $O\!\left(\frac{\lg n}{\lg h}\right)$. By definition of a code, $m$ cannot be smaller than $m_{\min} = \left\lceil \frac{\lg n}{\lg h} \right\rceil$, and $m_{\min}$ is
$\Omega\!\left(\frac{\lg n}{\lg h}\right)$; hence the code of Lemma 1 is asymptotically optimal. In particular, the ratio $m/m_{\min}$, where
$m$ is computed by the above formula, is dominated by the term $f(h) \lg_2 h \lesssim 1.44$, which is very close to 1 for
$h \ge 3$. Hence, no encoding can significantly improve $f(h)$ (or $g(h)$), decreasing $m/m_{\min}$. A few examples
of approximated values for $f(h)$, $g(h)$, and $f(h) \lg_2 h$ are:



h             2      3      4      10     100    1000
f(h)          1.44   0.68   0.52   0.29   0.15   0.10
g(h)          4.11   2.92   2.66   2.34   2.15   2.10
f(h) lg_2 h   1.44   1.09   1.04   1.00   1.00   1.00
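
The tabulated quantities can be recomputed numerically. The sketch below assumes the closed forms $f(h) = 1/(\lg_2(h - 1 + \sqrt{(h-1)(h+3)}) - 1)$ and $g(h) = 1 + f(h)\,(1 + (\lg_2(h-1) + \lg_2(h+3))/2)$, which is one reading of the formulas of Lemma 1 that agrees with the table up to small rounding differences; the function names and the sample values of $n$ are likewise assumptions of this sketch.

```python
import math

def f(h):
    """Assumed closed form of f(h), for h >= 2."""
    return 1.0 / (math.log2(h - 1 + math.sqrt((h - 1) * (h + 3))) - 1)

def g(h):
    """Assumed closed form of g(h), for h >= 2."""
    return 1 + f(h) * (1 + 0.5 * (math.log2(h - 1) + math.log2(h + 3)))

def code_length(n, h):
    """m = ceil(g(h) + f(h) * lg n), the code length of Lemma 1."""
    return math.ceil(g(h) + f(h) * math.log2(n))

# Columns: h, f(h), g(h), f(h) * lg h.
for h in (2, 3, 4, 10, 100, 1000):
    print(h, round(f(h), 2), round(g(h), 2), round(f(h) * math.log2(h), 2))

# Width 2m obtained from the code length, for a few sample automaton sizes n.
for n in (10, 100, 1000):
    print(n, [2 * code_length(n, h) for h in (2, 4, 10)])
```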



To prove Th. 2, a few more definitions are required. Define the following alphabetic homomorphisms:
$\alpha: A \times D \to A$, $\delta: A \times D \to D$ are such that $\alpha(a,d) = a$, $\delta(a,d) = d$ for every $a \in A$, $d \in D$.
A path of $M$ of length $t \ge 0$ is called a $t$-path. Paths $\eta_1, \eta_2, \ldots, \eta_k$ of $M$, $k \ge 2$, are called consecutive if
$\eta_1 \eta_2 \ldots \eta_k$ is also a path of $M$ (i.e., $e(\eta_h) = o(\eta_{h+1})$, for all $1 \le h \le k - 1$). With an abuse of notation,
let $[\ ]: (Q \times A \times Q)^* \to (A \times D)^*$ be defined on paths as follows. Let $\eta$ be a $t$-path. If $t = 0$ then $[\eta] = \varepsilon$;
if $1 \le t \le m$, let $[\eta]$ be the unique word $z$ in $(A \times D)^t$ such that $\alpha(z) = l(\eta)$, $\delta(z) = i_t([o(\eta)])$ (i.e.,
$\delta(z) = [o(\eta)]$ if $\eta$ is an $m$-path).

If $|\eta| > m$, then there exist a unique $k \ge 1$ and a unique $0 \le j \le m - 1$ such that $|\eta| = km + j$; hence,
there exist $k + 1$ consecutive paths of $M$, denoted by $\eta_1, \eta_2, \ldots, \eta_k, \eta_{k+1}$, such that $\eta = \eta_1 \eta_2 \ldots \eta_k \eta_{k+1}$,
each $\eta_h$, $1 \le h \le k$, is an $m$-path and $\eta_{k+1}$ is a $j$-path. This decomposition in consecutive paths is called
the canonical decomposition of $\eta$. Then, $[\eta]$ is defined as $[\eta_1][\eta_2] \ldots [\eta_k][\eta_{k+1}]$.
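
The encoding of a path through its canonical decomposition can be sketched as follows. This is an illustration under stated assumptions: `encode_path`, the sample code and the sample path are hypothetical, and a path is represented as a Python list of transitions `(p, a, q)`.

```python
def encode_path(path, code, m):
    """Return [eta] as a list of (letter, code symbol) pairs.

    path: list of transitions (p, a, q), each consecutive to the previous one;
    code: dict mapping each state to a string of length m over D.
    """
    z = []
    for start in range(0, len(path), m):
        block = path[start:start + m]      # an m-path, or the final j-path
        origin = block[0][0]               # o(eta_h)
        # zip truncates [o(eta_h)] to a prefix when the block is shorter than m
        for (_, letter, _), d in zip(block, code[origin]):
            z.append((letter, d))
    return z

code = {0: "0100", 1: "1100"}              # hypothetical code of length m = 4
path = [(0, "a", 1), (1, "b", 1), (1, "a", 0), (0, "a", 0),
        (0, "b", 1), (1, "b", 0)]
z = encode_path(path, code, 4)
print("".join(a for a, _ in z))            # alpha(z): abaabb
print("".join(d for _, d in z))            # delta(z): 010001
```

The first four symbols of the second projection spell the code of the origin of the first block, and the remaining two are a prefix of the code of the origin of the last, shorter block.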
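Let $L'$ be the $2m$-slt language defined by the following sets:

$I_{2m-1} = i_{2m-1}(\{[\eta'\eta''] \mid \eta', \eta'' \text{ are consecutive } m\text{-paths of } M \land \delta([\eta']) = [q_0]\})$;
$F_{2m} = f_{2m}(\{[\eta'\eta''\eta'''] \mid \eta', \eta'', \eta''' \text{ are consecutive } m\text{-paths of } M\})$;
$T_{2m-1} = t_{2m-1}(\{[\eta'\eta''\eta'''] \mid \eta', \eta'', \eta''' \text{ are consecutive paths of } M,\ |\eta'| = |\eta''| = m,\ 0 \le |\eta'''| < m \land e(\eta''\eta''') \in F\})$.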

The proof of the following lemma follows from uniqueness of factor-decodability:

³ Beware that $Q$, as a parameter of the Lucas sequence $U_m(P,Q)$, is not the set of states.

Lemma 2. Let $[\ ]: Q \to D^m$ be a factor-decodable code. For all $z \in F_{2m}$, there exist a position $j$,
$1 \le j \le m - 1$, and two consecutive paths $\eta_1, \eta_2$ of $M$ such that:

1. $\eta_1$ is an $m$-path, and $t_{2m-j+1}(z) = [\eta_1][\eta_2]$;

2. for any two consecutive paths of $M$, $\eta_I, \eta_{II}$, if $\eta_I$ is an $m$-path and $[\eta_I][\eta_{II}]$ is a suffix of $z$ then
$[\eta_1] = [\eta_I]$ and $[\eta_2] = [\eta_{II}]$;

3. if $\delta(i_m(z)) = [q]$ for some $q \in Q$, then $j = 1$, $\eta_2$ is an $m$-path and $o(\eta_1) = q$;

4. if $t_{2m-1}(z) \in T_{2m-1}$, then $e(\eta_1\eta_2) \in F$.

Lemma 3. There exists a finite language $L'' \subseteq A^+$ such that $\alpha(L') \cup L'' = L(M)$.

Sketch of the proof. Let $L''$ be the set of words in $L(M)$ of length less than $3m$.

Part (I): $(L(M) - L'') \subseteq \alpha(L')$. Assume that $x \in L(M)$, $|x| \ge 3m$. To show that $x \in \alpha(L')$,
we first claim the following result for every path, whether successful or not:

(*) for all paths $\eta$ of $M$, with $|\eta| \ge 3m$, $f_{2m}([\eta]) \subseteq F_{2m}$.

The proof of (*) is by induction on the canonical decomposition of $\eta$. Part (I) can now be completed.
For all $x \in L(M) - L''$, let $\eta$ be a successful path of $M$ with $l(\eta) = x$; moreover, let $\eta_1, \ldots, \eta_k, \eta_{k+1}$
be the canonical decomposition of $\eta$. By (*), $f_{2m}([\eta]) \subseteq F_{2m}$. But $\eta$ is successful: $o(\eta) = o(\eta_1) = q_0$,
hence $i_{2m-1}([\eta]) = i_{2m-1}([\eta_1][\eta_2]) \in I_{2m-1}$; $e(\eta) \in F$, hence $t_{2m-1}([\eta]) = t_{2m-1}([\eta_{k-1}][\eta_k][\eta_{k+1}]) \in T_{2m-1}$.
Therefore, $[\eta] \in L'$, and $x = \alpha([\eta]) \in \alpha(L')$.

Part (II): $\alpha(L') \subseteq L(M)$. The proof needs the assumption that code $[\ ]$ is factor-decodable. The
following property can be proved by induction on $k \ge 2$, by applying Lemma 2:

(+) for all words $z \in (A \times D)^+$, $|z| \ge 2m$, if $f_{2m}(z) \subseteq F_{2m}$ and $i_{2m-1}(z) \in I_{2m-1}$ then there
exists a path $\eta$ of $M$ such that $z = [\eta]$ and $o(\eta) = q_0$.

The proof of Part (II) follows from (+). In fact, if $x \in \alpha(L')$, with $|x| \ge 3m$, then there exists $z \in L'$ such
that $x = \alpha(z)$. Since in this case $i_{2m-1}(z) \in I_{2m-1}$, $f_{2m}(z) \subseteq F_{2m}$, $t_{2m-1}(z) \in T_{2m-1}$, by (+) there exists a
path $\eta$ of $M$ with origin in $q_0$ and such that $z = [\eta]$. Let $\eta_1, \eta_2, \ldots, \eta_k, \eta_{k+1}$ be the canonical decomposition
of $\eta$, with $|\eta| = km + j$, $k \ge 3$ and $0 \le j \le m - 1$ (hence $|\eta_{k+1}| = j$). Let $w = t_{2m}(z) = t_{2m}([\eta_{k-1}][\eta_k][\eta_{k+1}])$;
then $w \in F_{2m}$ and $t_{2m-1}(w) = t_{2m-1}(z) \in T_{2m-1}$. Apply Lemma 2, Part (1), to $w$.
Hence, there exist a position $h$ and consecutive paths $\eta', \eta''$, with $\eta'$ an $m$-path, such that
$t_{2m-h+1}(w) = [\eta'][\eta'']$. Since $[\eta_k][\eta_{k+1}]$ (of length $m + j \le 2m - 1$) is also a suffix of $w$, by Part (2) of
Lemma 2, $[\eta_k] = [\eta']$, $[\eta_{k+1}] = [\eta'']$. Since $o(\eta_k) = o(\eta')$, also paths $\eta_{k-1}, \eta'$ are consecutive. Hence,
$z = [\eta] = [\eta_1 \ldots \eta_{k-1}\eta_k\eta_{k+1}] = [\eta_1 \ldots \eta_{k-1}][\eta_k\eta_{k+1}] = [\eta_1 \ldots \eta_{k-1}][\eta'\eta''] = [\eta_1 \ldots \eta_{k-1}\eta'\eta'']$. Therefore,
path $\eta_1 \ldots \eta_{k-1}\eta'\eta''$ has label $x$, origin $q_0$, and, by Part (4) of Lemma 2, end $e(\eta'\eta'')$ in $F$, i.e., it is successful: $x \in L$.

The proof of Th. 2 is now immediate. By Lemmas 1 and 3, $m = \lceil g(h) + f(h) \lg_2 n \rceil$, and $L'$ is $2m$-slt.
Hence, $L$ is $(h|A|, 2\lceil g(h) + f(h) \lg_2 n \rceil)$-homomorphic, with $2\lceil g(h) + f(h) \lg_2 n \rceil$ being $O\!\left(\frac{\lg n}{\lg h}\right)$. A few examples of
width for various values of the number $n$ of states and of the alphabetic ratio $h$ are shown here.

From Regular to Strictly Locally Testable Languages

Hence, by enlarging the local alphabet, a smaller width suffices to construct the slt language. However,
it is useless to take an alphabetic ratio $h > n$, since in this case one can use the simpler construction
of Prop. 1. To finish, we note that for many regular languages one can obtain a homomorphic definition
that uses lower values of alphabetic ratio and/or width than those obtained by the main theorem.

5 Conclusion 

We have generalized Medvedev's homomorphic characterization of regular languages: instead of using 
as generator a local language over a large alphabet, which depends on the complexity of the regular 
language, we can use a strictly locally testable language over a smaller alphabet that does not depend on 
complexity, but just on the source alphabet. We have proved that the smallest alphabet one can use in the 
generator is the double of the alphabet of the regular language; thus, for instance, four symbols suffice
to homomorphically generate any regular binary language.

In the main proof we have offered a specific and fairly optimized construction of the strictly locally 
testable language, for which we have derived the relationship between the width, the alphabetic ratio, and 
the complexity of the regular language. In our opinion, the construction should be of interest in its own right,
as a new technique for simulating an NFA by means of a larger, yet strictly locally testable, machine.
Our encoding is asymptotically optimal with respect to language complexity, and remains very close to 
the theoretical optimum for finite values of complexity. But it is an open technical question whether a 
different construction would yield better values for the alphabetic ratio and the width parameter. 

Applications and developments of our result are conceivable in areas where a language characterization
à la Medvedev has been found valuable, such as the following.

Picture languages. A main family of 2-dimensional languages, the tiling systems [8], is defined by a
2-dimensional Medvedev characterization. Does our result extend to 2D languages?

Context-free languages. Combining our result with the Chomsky-Schützenberger theorem, it should be
possible to obtain non-erasing homomorphic characterizations using a small alphabet.

Consensual languages [5]. This generalization of finite-state machines, motivated by modelling tightly
connected concurrent computations, uses homomorphism between words as its core mechanism.

Information transmission. Reducing the receiver cost was already mentioned in the introduction.

Acknowledgments: Thanks to Aldo De Luca for suggesting relevant references.